Reengineering the specificity of the highly selective Clostridium botulinum protease via directed evolution

The botulinum neurotoxin serotype A (BoNT/A) cuts a single peptide bond in SNAP25, an activity used to treat a wide range of diseases. Reengineering the substrate specificity of BoNT/A’s protease domain (LC/A) could expand its therapeutic applications; however, LC/A’s extended substrate recognition (≈ 60 residues) challenges conventional approaches. We report a directed evolution method for retargeting LC/A and retaining its exquisite specificity. The resultant eight-mutation LC/A (omLC/A) has improved cleavage specificity and catalytic efficiency (1300- and 120-fold, respectively) for SNAP23 versus SNAP25 compared to a previously reported LC/A variant. Importantly, the BoNT/A holotoxin equipped with omLC/A retains its ability to form full-length holotoxin, infiltrate neurons, and cleave SNAP23. The identification of substrate control loops outside BoNT/A’s active site could guide the design of improved BoNT proteases and inhibitors.


Results and discussion
To screen for substrate specificity, depolarization after resonance energy transfer (DARET) 25,26 was used to measure LC/A cleavage of SNAP23 and SNAP25. Here, > 70 residues of each substrate were sandwiched between GFP and BFP (shown schematically in Fig. S2). This assay features exceptionally low background compared to a FRET-only assay and increased polarization upon proteolysis 25. Achieving consistent growth of individual LC/A variants overexpressed in deep-well microtiter plates required high-speed shaking at 900 rpm for more consistent aeration, along with optimization of the growth media and the fermentation time and temperature to favor LC/A expression while limiting its proteolysis (Materials and Methods). During each round of directed evolution, normalizing the initial rate of proteolysis for SNAP23 (V_0^{23}) to the analogous rate for SNAP25 (V_0^{25}) yielded a substrate specificity index, which accommodated variation in enzyme concentration (Fig. 1C). The assay proved effective at revealing even small gains in substrate specificity.
Error-prone PCR of the gene encoding qmLC/A in Round 1 uncovered three variants with potentially improved SNAP23 specificity: N240S, E201D/D203V, and K364R/Y387N. The latter two variants exhibited inconsistent SNAP23 specificities in triplicate follow-up screens, perhaps due to protein insolubility. Only N240S qmLC/A yielded a consistent increase in SNAP23 specificity compared to qmLC/A. Additional site-saturation at this position in Round 2 identified N240A as improving SNAP23 specificity approximately 1.3-fold or > 1σ above the mean specificity for qmLC/A data at the same screening dilution (σ = standard deviation) (Fig. 2A). This residue exists in a semi-ordered loop approximately 6 Å away from the beta hairpin of the beta exosite, an important contributor to binding SNAP25.
Subsequent rounds of directed evolution were guided by rational design, including inspection of the LC/A-SNAP25 co-crystal structure (Fig. S3) 27 and prior work 23,24. Rounds 3 to 7, for example, included residues with high B-factors, which can be productive sites for substitution 28. Investigating these sites with an iterative mutagenesis method 29 improved specificity for SNAP23 up to 80-fold over qmLC/A in the DARET screens (Fig. 2A). In each round, Q pool analysis 30 was used to assess library quality, and the TopLib program estimated the numbers of variants required for screening 31. Later rounds of directed evolution identified a sensitivity to the screening and assay conditions. In early rounds, variants were diluted and screened in a "salt-free" assay buffer (50 mM HEPES pH 7.4, 0.05% Tween with no added salt) chosen to maximize LC/A cleavage activity in vitro 25. These screening conditions yielded LC/A variants that cleaved SNAP23 at over three times the rate of SNAP25 cleavage (Fig. 2A). However, the presence of salt reversed these gains in SNAP23 specificity for the selectant from Round 7 (Fig. 2B, Fig. S4). The results vividly illustrate the First Law of Directed Evolution, "you get what you screen for" 32, as screening took place in salt-free buffer.
Figure 1 (caption, panels B to D): (B) Co-crystal structure (PDB: 1XTG 27) of LC/A (white) and SNAP25 (dark gray). Eight substitutions in LC/A drive SNAP23 specificity (teal) through substrate control loops (pink) alongside prior substitutions (light gray). (C) Platform for the directed evolution of SNAP23 substrate specificity. 1. Random or site-directed mutagenesis (e.g., site-saturation); 2. QC by DNA sequencing; 3. High-throughput protein production; 4. Measure V_0^{23} and V_0^{25} for substrate specificity; 5. Confirmation screens. The most specific and consistent variant from the DARET assay entered the next round of directed evolution. (D) Sequence alignment of substrates used for screening (UniProt: P60880, O00161). The SNAP binding exosites in LC/A (residue numbers above) and cleavage site (scissors) are shown. The gradient of color indicates homology from identical (white, *), to strongly similar (light gray, :), weakly similar (teal, .), or dissimilar (dark teal, space).
Since we sought protease variants having specificity for SNAP23 within the salty environment of cells, we addressed this issue with a final round of mutagenesis and screening in salt buffer. During earlier rounds, positions Q29 and N53 were mutated to positively charged Arg, introducing a potential source of salt sensitivity. Therefore, the two sites were revisited with 19 AA substitution libraries and screened in a salt-containing buffer (50 mM KH2PO4, pH 7.4). Screening these libraries revealed that substituting His at residue 53 confers SNAP23 specificity to LC/A even in salt buffer, improving screening specificity 1300-fold compared to qmLC/A (Figs. 2B,C, S4). This selectant was termed the octuple mutant LC/A (omLC/A).
Prior work attributed substrate specificity in LC/A primarily to the alpha and beta binding exosites 22,27,33,34. These binding sites comprise the first and last interactions between substrate and protease, respectively. The results presented here identify a new series of residues between these exosites exerting control over SNAP23/SNAP25 specificity. We term these substrate control loops 20 (LC/A residues 26 to 29) and 50 (residues 52 to 56) (Fig. 1B,D). Two categories of substitutions in these substrate control loops improved SNAP23 specificity. First, the enzyme's overall charge became more positive. For instance, the loss of E55 and E56 in control loop 50 increased SNAP23 specificity up to two-fold (Fig. 2C). Similarly, substituting positively charged Arg residues at positions in both control loops (A27, Q29, N53) improved SNAP23 specificity up to two-fold for each substitution. Increasing the positive charge in substrate control loop 20 likely accommodates D191 and D198 in SNAP23 (Fig. S5). Indeed, mutating SNAP25 residues to SNAP23 residues at these sites (E183D, T190D) reduces cleavage by wtLC/A up to 60% compared to cleavage of the native SNAP25 22.
Figure caption fragment: […] Fig. S1, the substrate specificity of the qmLC/A variant is sensitive to screening dilution; therefore, each point represents the variant's average specificity normalized to the qmLC/A average specificity at the corresponding screening dilution. Advancement to the next round (solid line) weighed specificity index, protein solubility, and stability.
www.nature.com/scientificreports/
The second category of substitutions conferring greater SNAP23 specificity in these substrate control loops focuses on sidechain size. For example, substituting a small, hydrophilic serine for N26 in Round 3 increased specificity by nearly five-fold compared to the best Round 2 variant (Fig. 2C). Conversely, mutating the neighboring A27 to the larger Leu sidechain increased specificity 2.4-fold from Round 6 to Round 7. Position 26 could directly contact the SNAP23 substrate, whereas position 27 is directed towards control loop 50 (Fig. S5). Changing the size and flexibility of these residues therefore may reshape this SNAP-LC/A binding region to favor SNAP23 binding over SNAP25.
In addition to the discovery of substrate control loops 20 and 50 in LC/A, directed evolution revealed new information on established binding sites. At the alpha exosite, the SNAP25 residues in contact with LC/A are highly homologous to those in SNAP23, which suggests both substrates bind in a similar manner to this region. Indeed, generating libraries at alpha exosite residues in LC/A, such as K337, yielded few variants with proteolytic activity, and none with SNAP23 specificity (Fig. S6). Targeting residues near or within the beta exosite, however, was more productive. For example, libraries at N240 and S254 generated variants with up to three-fold improved SNAP23 specificity (Fig. 2C).
Comparing the enzyme kinetics for omLC/A and qmLC/A reveals the basis for the improved specificity of omLC/A. The main driver for omLC/A's improved SNAP23 specificity is a 20-fold reduction in catalytic efficiency (k_cat/K_M) for cleaving SNAP25 (Table 1, Figs. 3A,B, S7). Additionally, omLC/A improves the catalytic efficiency for cleaving SNAP23 by six-fold. Notably, our directed evolution strategy assessed both SNAP23 and SNAP25 cleavage activities, driving these trends. The catalytic rate (k_cat) for hydrolysis of SNAP25 is 40-fold lower for omLC/A compared to qmLC/A. In line with previous studies 27,35,36, the eight added mutations in omLC/A impact catalytic rate more than substrate affinity. Overall, the results represent a 120-fold improvement in SNAP23-specific catalytic efficiency from qmLC/A to omLC/A. The in vitro screens applied here offer an imperfect model for the human SNAP23 target. In cells, both SNAP23 and BoNT/A are membrane-associated 42, which could dramatically affect the enzyme's activity; these cellular conditions could diminish the importance of K_M and promote k_cat as more important for LC/A engineering.
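The 120-fold figure follows from combining the two reported changes in catalytic efficiency. A minimal sketch of that arithmetic is given below; the function name and the arbitrary-unit efficiencies are illustrative assumptions chosen only to reproduce the reported six-fold gain on SNAP23 and 20-fold loss on SNAP25, and they are not the measured values from Table 1.

```python
def specificity_fold_change(om_snap23, om_snap25, qm_snap23, qm_snap25):
    """Fold change in the ratio (efficiency on SNAP23) / (efficiency on SNAP25)
    when moving from qmLC/A to omLC/A. Inputs are k_cat/K_M values in any
    consistent unit."""
    return (om_snap23 * qm_snap25) / (om_snap25 * qm_snap23)


# Hypothetical efficiencies in arbitrary units (NOT the values from Table 1):
# omLC/A is six-fold better than qmLC/A on SNAP23 and 20-fold worse on SNAP25.
qm_snap23, om_snap23 = 1.0, 6.0
qm_snap25, om_snap25 = 20.0, 1.0

overall = specificity_fold_change(om_snap23, om_snap25, qm_snap23, qm_snap25)
print(f"Overall change in SNAP23-specific catalytic efficiency: {overall:.0f}-fold")
```

Because the index is a ratio, a gain on the target substrate and a loss on the off-target substrate multiply rather than add.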
Interestingly, the omLC/A variant exhibits strong dependence on Zn2+ for its substrate specificity, but not its activity. Without added Zn2+ during purification, the enzyme remains proteolytic, presumably due to endogenous Zn2+, but has a higher rate of cleavage for SNAP25 than SNAP23. The addition of Zn2+ (2 µM) during protein isolation reverses this specificity, and restores the enzyme's preference for SNAP23. Importantly, once treated with Zn2+, the enzyme remains specific for SNAP23 (Fig. S8).
We next determined how omLC/A recognizes and cleaves the full-length SNAP23 (fl-S23) protein using mass spectrometry. Loss of the eight C-terminal residues of fl-S23 plus water (RAKKLIDS) in a cleavage product with a putative charge state of [M + 2H]2+ is observed upon treatment with omLC/A (Fig. 3C, Fig. S14). This cleavage site in fl-S23 aligns with the expected SNAP25 site for wtLC/A hydrolysis (Fig. 1D). Therefore, the stringent cut-site specificity of this class of proteases is preserved in our engineered omLC/A variant for SNAP23; indeed, omLC/A fails to cleave the C-terminal domain of SNAP29 (Fig. S9), which is highly homologous to both SNAP23 and SNAP25 (Fig. 1A). The results suggest omLC/A interrogates its substrates and binds similarly to wtLC/A binding SNAP25 (as shown in Fig. S5).
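For orientation, the theoretical m/z of a peptide at a given charge state can be estimated from monoisotopic residue masses. The sketch below does this for the RAKKLIDS sequence at a 2+ charge state; the helper name is hypothetical, the residue masses are standard monoisotopic values rounded to five decimals, and the calculation is an illustration rather than the analysis used for Fig. 3C or Fig. S14.

```python
# Standard monoisotopic residue masses (Da) for the residues in RAKKLIDS.
MONOISOTOPIC_RESIDUE = {
    "R": 156.10111,
    "A": 71.03711,
    "K": 128.09496,
    "L": 113.08406,
    "I": 113.08406,
    "D": 115.02694,
    "S": 87.03203,
}
WATER = 18.01056   # one water per free peptide (N-terminal H plus C-terminal OH)
PROTON = 1.00728   # mass of each charge-carrying proton


def peptide_mz(sequence, charge):
    """Theoretical m/z of an unmodified peptide as [M + zH]z+."""
    neutral_mass = sum(MONOISOTOPIC_RESIDUE[aa] for aa in sequence) + WATER
    return (neutral_mass + charge * PROTON) / charge


mz = peptide_mz("RAKKLIDS", charge=2)
print(f"RAKKLIDS [M + 2H]2+ theoretical m/z: {mz:.3f}")
```

The peptide contains no cysteine, so the iodoacetamide treatment described in the Methods would not shift this particular mass.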
We next incorporated omLC/A into the full-length BoNT/A holotoxin, termed omBoNT/A. In the experiments reported here, purified omLC/A and omBoNT/A exhibited comparable stability and purity (Fig. S11) to wild-type LC/A and BoNT/A, respectively. Note that the omBoNT/A reported here has a C-terminal His6-tag, is "scar-free," and lacks any sequences required for ligation-mediated assembly. Recombinant expression of omBoNT/A yielded a soluble 150 kDa polypeptide; as expected, reduction by DTT released the 100 kDa (HC/A) and the 50 kDa (omLC/A) proteins (Fig. 3D, Fig. S12A). Release of BoNT LC proteases through disulfide bond reduction is vital to successful cytosolic delivery and SNAP cleavage 6. The omBoNT/A retains its SNAP23 cleavage activity in vitro and in cellulo (Figs. 3E,F, S12B-C, S13). Additionally, the omBoNT/A exhibits at least 25-fold reduced muscle paralysis associated with SNAP25 cleavage compared to wtBoNT/A (Fig. S10A); the residual paralysis observed demonstrates successful cellular delivery of active omLC/A into motor nerve terminals. Furthermore, no systemic toxicity effects were observed for mice treated with 5 ng/kg omBoNT/A (Fig. S10B). Additional engineering of the omLC/A holotoxin is needed to enable efficient uptake into non-neuronal cells for useful SNAP23 cleavage activity in vivo.
This report demonstrates the plasticity of both wtLC/A's hyperspecific substrate preferences and folding of the full-length, non-ligation-mediated holotoxin. omLC/A preserves wtLC/A's vaunted selectivity for cleaving a single targeted peptide bond with no additional cleavage products observed by MS. Since the belt acts as a pseudosubstrate, incorporation of multiple, specificity-controlling mutations near and in the binding cleft could also disrupt proper folding and function of the holotoxin. However, the eight mutations of omLC/A do not interfere with the omBoNT/A's ability to fold, bind, and then release its peptide "belt," as required for the holotoxin's successful assembly and function; the result validates the approach of focusing substrate-tailoring efforts on the LC domain.
The SNAP25 substrate wraps around nearly the entire circumference of LC/A, complicating rational attempts to alter substrate specificity. We address this with a novel fluorescence polarization-based platform that expands substrate screening from peptide-sized to include an entire domain of SNARE proteins. The approach offers general applicability, and can be applied to non-BoNT proteases. However, the SNARE proteins are well-suited to the extreme specificity testing demanded by the BoNT family proteases; for example, the SNAREs are structurally malleable. For non-BoNT proteases, which might not feature extremely long half-lives, catalytic rates should be considered with the substrate specificity index to identify the most active and selective proteases.
The advances reported here include a robust, readily implementable high-throughput assay system for evolution of LC/A, which uncovered substrate control loops. Furthermore, in cellulo and in vivo (mice) application of an engineered, full-length omBoNT/A holotoxin accomplishes key preclinical milestones along the pathway for drug development. Together, the experiments demonstrate the compatibility of omBoNT/A's mutations in the context of the full-length holotoxin and provide a blueprint for redirecting the enzyme's vaunted substrate specificity.
With respect to general lessons for protease engineering, conventional approaches typically use a short substrate analog as a proxy for cleavage of the target substrate. A short substrate, however, would likely fail to identify the substrate control loops uncovered here. Furthermore, the exceptionally low background of the DARET assay proved critical to the success of this project, as the minimal background allowed high-throughput screening of crude cell lysates; assay development for screening should focus on these criteria initially and then on identifying conditions as similar to the target environment as possible. For example, the reported experiments illustrate the unexpected sensitivity of LC/A variants to Zn2+ and salt concentration in the absence of such screening criteria.
Our data suggest avoiding the alpha exosite for reengineering substrate specificity and focusing on the substrate control loops uncovered here. Notably, substrate control loops 20 and 50 are structurally conserved amongst BoNT serotypes, yet feature divergent sequences (Fig. 4), which suggests their importance to directing the diverse substrate specificities of these proteases. However, pan-BoNT/A inhibitors (e.g., for anti-botulism therapies) could target the alpha exosite due to its recognition of a highly conserved region of SNAP substrates. Therefore, the data reported here could guide engineering of both catalysts and inhibitors.
BoNT represents a unique biotherapeutic as it catalytically ablates its target. Such capabilities are becoming increasingly important with engineered proteins, including proteases, emerging in pharmaceutical pipelines 1 . We hope this work inspires new uses for BoNT proteases to treat a wide range of diseases.

Materials and methods
Construction of the qmLC/A encoding plasmid. The gene encoding the qmLC/A variant (E148Y, K166F, S254A, G305D) was constructed through overlap extension PCR using the gene encoding wild-type LC/A (WT, amino acids 1 to 430) as the template. The resulting PCR products were DpnI-digested, column purified and concentrated, then ligated and subcloned via Gibson Assembly into a pET-29b(+) vector featuring a C-terminal His6 tag for recombinant protein expression and purification. Sanger sequencing (Genewiz LLC) confirmed construction of qmLC/A-pET-29b(+).

Batch expression and purification of LC/A variants. The LC/A-pET-29b(+) WT and mutant plasmids were transformed into BL21 (DE3) E. coli (One Shot™ Star™, ThermoFisher) cells and spread on LB kan agar plates, then incubated at 37 °C overnight. A single colony was used to inoculate an LB kan seed culture incubated for 6 to 18 h at 37 °C with shaking at 225 rpm. The LB kan expression cultures were supplemented with glucose (1% w/v) and inoculated with seed culture (1:1000), then incubated at 37 °C with shaking at 225 rpm until the culture reached an OD600 of 0.6. The LC/A protein production was induced by addition of isopropyl β-D-1-thiogalactopyranoside (IPTG, 1 mM final concentration) and incubated at 25 °C with shaking at 225 rpm for 20 to 24 h. The cells were harvested by centrifugation (6084×g, 4 °C, 20 min) and resuspended while on ice in LC Lysis Buffer (100 mM HEPES, 25 mM imidazole, 500 mM NaCl, pH 7.4) for storage at −80 °C or immediately subjected to lysis. The soluble protein was extracted from cells via sonication followed by centrifugation (26,891×g, 4 °C, 45 min), and the supernatant was then batch bound overnight at 4 °C to nickel-charged IMAC resin (Profinity, BioRad), which had been equilibrated with LC Lysis Buffer. The LC/A variants were purified via a gravity column, which was first washed with 20 column volumes (c.v.) of LC Lysis Buffer, then 12 c.v. LC Lysis Buffer supplemented with 50 mM imidazole, and finally eluted with 16 c.v. LC Lysis Buffer supplemented with 500 mM imidazole, collecting four 4-mL fractions. The fractions were analyzed by SDS PAGE (12% acrylamide) and visualized by Coomassie blue stain. LC/A-containing fractions with a purity of 95% or higher were combined and dialyzed into chilled Storage Buffer at 4 °C. After sterile filtration through a 0.22 […] Fig. S12 and S13).
Construction, expression, and purification of SNAP29 substrate. SNAP29 DARET was constructed by inserting a gene fragment of human SNAP29 (amino acids 185 to 258, Genewiz) between eGFP and eBFP2 via overlap extension PCR, then cloning into pET-28c(+) through Gibson assembly. The SNAP29 DARET-pET-28c(+) construct was confirmed by Sanger sequencing (Genewiz). The SNAP29 DARET-pET-28c(+) plasmid was transformed into BL21 (DE3) E. coli cells and spread on an LB kan agar plate, then incubated at 37 °C overnight. The SNAP29 DARET substrate was expressed as described for the SNAP23 and SNAP25 DARET substrates with minor alterations (OD600 of 0.8, 0.25 mM final IPTG). Purification and storage of the SNAP29 substrate were performed as described for SNAP23 and SNAP25 DARET substrates.
Library generation. The qmLC/A encoding plasmid served as a template for the first round of mutagenesis, and the qmLC/A provided a positive control and standard for each screen of SNAP23/25 specificity. The PCR products were DpnI-digested, column purified and concentrated, and then subcloned into pET29b(+) via Gibson Assembly and evaluated by Sanger sequencing. The error-prone PCR (epPCR) library of the gene encoding qmLC/A was created with the GeneMorph II kit (Agilent) according to the manufacturer's instructions. The Round 1 epPCR library yielded the following mutants having SNAP23/SNAP25 specificity greater than qmLC/A: E201D/D203V, N240S, K364R/Y387N. Site-directed mutagenesis libraries were generated with overlap extension PCR using oligonucleotides featuring degenerate sequences at selected residues. In Round 2, the saturated amino acid substitution library (20 amino acids) was constructed using Tang's mutagenesis method 37 at position 240. In Rounds 3 to 6 and part of Round 7, iterative saturation mutagenesis using an NDT codon (encoding C, D, F, G, H, I, L, N, R, S, V, and Y) was employed at residues N26, A27, G28, Q29, M30, T52, N53, E55, E56, E171, S143, Q162, M253, T327, and K337. These rounds yielded the N26S, Q29R, N53R, and E55V variants. In the second part of Round 7 and the entirety of Round 8, libraries featuring NDT and VHG codons (substitution with all AAs except Trp) were constructed at residues Q29, N53, Y148, F166, A254, and D305. These rounds yielded the N53H, A254L, and D305G variants. Each saturation library was subject to Q pool analysis for quality control 30; libraries with a Q pool < 0.7 were subcloned again from PCR fragments and further analyzed before screening.
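The amino acid coverage of the degenerate codons described above can be checked by expanding each codon against the standard genetic code. The following is a minimal sketch using only the Python standard library; the helper names are illustrative and not part of the reported workflow.

```python
from itertools import product

# Standard genetic code in the conventional TCAG ordering
# (first base varies slowest, third base fastest).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# IUPAC nucleotide ambiguity codes needed for NDT and VHG.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "D": "AGT", "V": "ACG", "H": "ACT",
}


def expand(degenerate_codon):
    """List every concrete codon matched by a degenerate codon."""
    return ["".join(c) for c in product(*(IUPAC[b] for b in degenerate_codon))]


def encoded_amino_acids(degenerate_codon):
    """Set of one-letter amino acids (or '*' for stop) that a codon encodes."""
    return {CODON_TABLE[c] for c in expand(degenerate_codon)}


ndt = encoded_amino_acids("NDT")
vhg = encoded_amino_acids("VHG")
combined = ndt | vhg
missing = set("ACDEFGHIKLMNPQRSTVWY") - combined

print("NDT:", "".join(sorted(ndt)))          # CDFGHILNRSVY
print("VHG:", "".join(sorted(vhg)))
print("Not covered by NDT + VHG:", missing)  # {'W'}
```

NDT expands to 12 codons and VHG to 9, with no stop codons in either set, so the two together sample 19 amino acids with 21 codons.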
Library expression, harvest, and preparation for screening.
DARET assay to screen for SNAP23/SNAP25 specificity. Recombinantly expressed and purified SNAP25 DARET substrate was diluted to 3 µM in Assay Buffer or Intracellular Buffer, then 100 µL substrate was added to each well of a 96-well flat, black, non-binding surface microtiter plate (Corning). From the diluted lysate plate, 50 µL of the blank was added to its corresponding well in the black plate and used to optimize the gain and Z position of a Spark fluorescence polarization plate reader (Tecan). The sample was excited with polarized light at 380 nm with a 20 nm bandwidth, written here as 380 (20) nm, and the polarized emission detected at 535 (25) nm. For the remaining 95 wells, 50 µL from each well of the lysate dilution plate was added to the black plate containing substrate and the entire plate screened kinetically for 50 min to 14 h at 28 ± 1 °C. The assay steps were then repeated for 3 µM SNAP23 DARET using the same lysate dilution plate. The final concentration of each substrate screened was 2 µM. The changes in polarization over time were visualized using Prism (GraphPad) and initial rates (V_0) were derived from fitting trendlines to the initial, linear portion of the raw data. The rates of negative controls (no enzyme) for each DARET substrate were subtracted from the rates of each variant to account for nonenzymatic changes in polarization. The specificity indices were calculated via the ratio of SNAP23 and SNAP25 initial rates for each clone according to Eq. (1).
Specificity index = V_0^{23} / V_0^{25}    (1)
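The rate analysis described in this section can be sketched in a few lines. The example below is an illustration under stated assumptions, not the analysis performed in Prism: the function names, the choice of how many points count as the linear portion, and the synthetic polarization traces are all hypothetical.

```python
def linear_slope(x, y):
    """Ordinary least-squares slope of y against x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def initial_rate(times, polarization, n_linear):
    """Initial rate from the first n_linear points of a kinetic trace.
    Deciding which points are 'linear' is left to the analyst."""
    return linear_slope(times[:n_linear], polarization[:n_linear])


def specificity_index(v23, v25, blank23=0.0, blank25=0.0):
    """Ratio of blank-corrected SNAP23 and SNAP25 initial rates."""
    corrected25 = v25 - blank25
    if corrected25 <= 0:
        raise ValueError("SNAP25 rate is not above the no-enzyme control")
    return (v23 - blank23) / corrected25


def final_concentration(stock_um, stock_ul, added_ul):
    """Concentration after adding lysate to the substrate in the well."""
    return stock_um * stock_ul / (stock_ul + added_ul)


# Synthetic traces (polarization units vs. minutes), for illustration only.
times = [0, 1, 2, 3, 4, 5]
snap23_trace = [100 + 3.0 * t for t in times]
snap25_trace = [100 + 1.0 * t for t in times]
blank_trace = [100 + 0.5 * t for t in times]   # no-enzyme control

v23 = initial_rate(times, snap23_trace, n_linear=6)
v25 = initial_rate(times, snap25_trace, n_linear=6)
v_blank = initial_rate(times, blank_trace, n_linear=6)

index = specificity_index(v23, v25, blank23=v_blank, blank25=v_blank)
print(f"Specificity index: {index:.2f}")
print(f"Final substrate concentration: {final_concentration(3.0, 100, 50):.1f} µM")
```

Because both rates come from the same lysate dilution, the enzyme concentration appears in numerator and denominator and largely cancels in the ratio.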
In Rounds 2 to 8, LC/A variants with specificity indices at least 1.5 times higher than qmLC/A or the most specific variant from the previous round of directed evolution were subject to further screening in triplicate. From the glycerol stocks, the variants and controls (qmLC/A, last best variant) were streaked onto LB kan plates for overnight incubation at 37 °C. Three colonies from these streaks were used to inoculate three wells of a seed culture DWP, then three wells of an expression culture DWP for LC/A variant production, harvesting, lysis, and screening as described above. The specificity index of each well was calculated first, then indices averaged together for the same variant. LC/A variants demonstrating consistent, improved specificity over the last best variant (> 1σ, where σ = standard deviation) were selected as starting points for the next round of mutagenesis and screening.
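The two-stage selection described here can be expressed as a pair of simple checks. The sketch below assumes that σ is the sample standard deviation of the reference variant's replicate indices, which is one reading of the criterion; the function names and the replicate values are hypothetical.

```python
from statistics import mean, stdev


def first_pass_hit(variant_index, reference_index, fold=1.5):
    """Primary screen: index at least `fold` times the reference index."""
    return variant_index >= fold * reference_index


def passes_confirmation(variant_indices, reference_indices, n_sigma=1.0):
    """Confirmation screen: mean replicate index exceeds the reference mean
    by more than n_sigma reference standard deviations (an assumption about
    how sigma is defined)."""
    threshold = mean(reference_indices) + n_sigma * stdev(reference_indices)
    return mean(variant_indices) > threshold


# Hypothetical specificity indices from triplicate wells.
reference = [1.0, 1.1, 0.9]      # e.g. the last best variant
candidate_a = [2.0, 2.2, 2.1]    # consistently improved
candidate_b = [1.05, 1.0, 1.1]   # within the noise of the reference

print(first_pass_hit(1.6, 1.0))                     # True
print(passes_confirmation(candidate_a, reference))  # True
print(passes_confirmation(candidate_b, reference))  # False
```

Averaging per-well indices, rather than averaging rates first, matches the order of operations stated in the text.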

Mass spectrometry.
To determine the SNAP23 cleavage site of the LC/A variants, recombinant full-length SNAP23 (fl-S23) with N-terminal hexa-His and TEVp tags was expressed and purified to > 95% purity by SDS PAGE, then dialyzed into 20 mM Tris, pH 8.0 at 4 °C overnight. For mass spectrometry (MS) assays, fl-S23 was treated with dithiothreitol (7.5 mg/mL) at 80 °C for 10 min followed by iodoacetamide (9 mg/mL) at 25 °C for 1 h in the dark. This treatment functionalized the six free cysteines between the N- and C-terminal domains to prevent inter- and intra-disulfide bonding, thus simplifying any subsequent mass spectra. The functionalized fl-S23 was then incubated overnight at RT with the omLC/A variant at final concentrations of 2 µM substrate, 50 nM protease in Storage Buffer. The negative control was fl-S23 incubated with Storage Buffer. The samples were analyzed by liquid chromatography-electrospray ionization mass spectrometry using a Xevo G2-XS mass […]
[…] antibodies for 1 h at RT (goat anti-mouse IRDye 680RD, 1:5000, Li-Cor; goat anti-rabbit IRDye 800CW, 1:5000, Li-Cor). The membrane was then scanned using the Odyssey-CLx imaging system (Li-Cor).
Mouse digit abduction score (DAS) assay. All procedures and experiments involving animals were conducted in accordance with relevant guidelines and regulations, and approved by Allergan Animal Care and Use Committee (AACUC; approved protocol #225-100,051-2019). The DAS assay was performed as previously reported 38,39. In summary, female CD-1 mice (Charles River), with an average weight of 30.2 g and aged 6 to 10 weeks, were used in this study. The omBoNT/A and purified BoNT/A1 neurotoxin (Metabiologics Inc., referred to as wtBoNT/A) were diluted in 0.5% human serum albumin in 0.9% saline (Fresenius Kabi, 918,620). For the assay, 0.005 mL of each diluted holotoxin was injected into the right gastrocnemius muscle. Three mice per dose (n = 3) were tested in triplicate (N = 3). The DAS, well-being score, and weight were recorded daily for 4 days. The results were plotted using Prism (GraphPad). Each mouse's well-being was scored on a 4-point system (0 = activity level normal; 1 = slightly diminished activity level and/or slight weight loss (5-10%); 2 = moderately diminished activity level and moderate weight loss (10-15%); 3 = severely diminished activity level, little to no reaction to outside stimuli, inability to ambulate, agonal or labored respiration). To the best of our abilities, the studies were carried out in compliance with the ARRIVE guidelines.

Data availability
All data needed to evaluate the conclusions in the paper are present in the paper and/or the Supplementary Materials.